
    A probabilistic model for gene content evolution with duplication, loss, and horizontal transfer

    We introduce a Markov model for the evolution of a gene family along a phylogeny. The model includes parameters for the rates of horizontal gene transfer, gene duplication, and gene loss, in addition to branch lengths in the phylogeny. The likelihood for the changes in the size of a gene family across different organisms can be calculated in O(N+hM^2) time and O(N+M^2) space, where N is the number of organisms, h is the height of the phylogeny, and M is the sum of family sizes. We apply the model to the evolution of gene content in Proteobacteria using the gene families in the COG (Clusters of Orthologous Groups) database.
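The abstract does not spell out the transition rates, so the following is only a minimal sketch under an assumed parameterization: a linear birth-death process with immigration, in which a family of size n gains a gene at rate `transfer + n * dup` and loses one at rate `n * loss`. The function name and all parameter names are hypothetical, and the code simulates family size along a single branch; it is not the likelihood algorithm the abstract describes.

```python
import random


def simulate_family_size(n0, branch_length, dup, loss, transfer, rng):
    """Simulate gene family size along one branch (Gillespie-style).

    Assumed rates (not taken from the abstract):
      gain: transfer + n * dup   (horizontal transfer plus duplication)
      loss: n * loss             (each copy is lost independently)
    """
    n = n0
    clock = 0.0
    while True:
        gain_rate = transfer + n * dup
        loss_rate = n * loss
        total = gain_rate + loss_rate
        if total == 0:
            return n  # no event can occur
        # Waiting time to the next event is exponential with rate `total`.
        clock += rng.expovariate(total)
        if clock > branch_length:
            return n  # branch ends before the next event
        # Choose the event type in proportion to its rate.
        if rng.random() * total < gain_rate:
            n += 1
        else:
            n -= 1


if __name__ == "__main__":
    rng = random.Random(0)
    sizes = [simulate_family_size(2, 1.0, 0.3, 0.5, 0.1, rng) for _ in range(5)]
    print(sizes)
```

Averaging many such runs gives a rough empirical check on any analytically computed transition probability for a single branch, under the same assumed rates.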

    A Model of Problem Solving Environment for Integrated Bioinformatics Solution on Grid by Using Condor

    Abstract. Solving real-world bioinformatics problems on a grid requires integrating various analysis tools, in addition to implementing the basic tools themselves. A workflow-based problem solving environment on the grid can be an efficient solution for this type of software development. Here we propose a model of a simple problem solving environment that enables component-based workflow design of integrated bioinformatics applications in a grid environment using Condor functionalities.

    Metabolism and evolution of Haemophilus influenzae deduced from a whole genome comparison with Escherichia coli

    BACKGROUND: The 1.83 Megabase (Mb) sequence of the Haemophilus influenzae chromosome, the first completed genome sequence of a cellular life form, has been recently reported. Approximately 75 % of the 4.7 Mb genome sequence of Escherichia coli is also available. The life styles of the two bacteria are very different - H. influenzae is an obligate parasite that lives in human upper respiratory mucosa and can be cultivated only on rich media, whereas E. coli is a saprophyte that can grow on minimal media. A detailed comparison of the protein products encoded by these two genomes is expected to provide valuable insights into bacterial cell physiology and genome evolution. RESULTS: We describe the results of computer analysis of the amino-acid sequences of 1703 putative proteins encoded by the complete genome of H. influenzae. We detected sequence similarity to proteins in current databases for 92 % of the H. influenzae protein sequences, and at least a general functional prediction was possible for 83 %. A comparison of the H. influenzae protein sequences with those of 3010 proteins encoded by the sequenced 75 % of the E. coli genome revealed 1128 pairs of apparent orthologs, with an average of 59 % identity. In contrast to the high similarity between orthologs, the genome organization and the functional repertoire of genes in the two bacteria were remarkably different. The smaller genome size of H. influenzae is explained, to a large extent, by a reduction in the number of paralogous genes. There was no long range colinearity between the E. coli and H. influenzae gene orders, but over 70 % of the orthologous genes were found in short conserved strings, only about half of which were operons in E. coli. Superposition of the H. influenzae enzyme repertoire upon the known E. coli metabolic pathways allowed us to reconstruct similar and alternative pathways in H. influenzae and provides an explanation for the known nutritional requirements. 
    CONCLUSIONS: By comparing proteins encoded by the two bacterial genomes, we have shown that extensive gene shuffling and variation in the extent of gene paralogy are major trends in bacterial evolution; this comparison has also allowed us to deduce crucial aspects of the largely uncharacterized metabolism of H. influenzae.

    A Fixed-Parameter Algorithm for Minimum Common String Partition with Few Duplications

    Abstract. Motivated by the study of genome rearrangements, the NP-hard Minimum Common String Partition problem asks, given two strings, to split both strings into an identical set of blocks. We consider an extension of this problem to unbalanced strings, so that some elements may not be covered by any block. We present an efficient fixed-parameter algorithm for the parameters number k of blocks and maximum occurrence d of a letter in either string. We then evaluate this algorithm on bacterial genomes and synthetic data.
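To make the problem statement concrete, here is a brute-force sketch for the balanced case only (both strings contain the same letters with the same multiplicities). It is exponential in the string length and is not the fixed-parameter algorithm of the abstract, nor does it handle the unbalanced extension. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def can_tile(target, blocks):
    """Return True if `target` is a concatenation of exactly the blocks
    in the multiset `blocks` (a Counter), in some order."""
    if not target:
        return sum(blocks.values()) == 0
    for block in list(blocks):
        if blocks[block] > 0 and target.startswith(block):
            blocks[block] -= 1
            ok = can_tile(target[len(block):], blocks)
            blocks[block] += 1  # restore the multiset before continuing
            if ok:
                return True
    return False


def min_common_string_partition(a, b):
    """Return a smallest list of blocks of `a` that can be reordered to
    spell `b`, or None if the strings are not balanced."""
    if Counter(a) != Counter(b):
        return None
    n = len(a)
    if n == 0:
        return []
    # Try partitions of `a` with 1, 2, ... blocks; the first hit is minimum.
    for k in range(1, n + 1):
        for cuts in combinations(range(1, n), k - 1):
            bounds = (0, *cuts, n)
            blocks = [a[i:j] for i, j in zip(bounds, bounds[1:])]
            if can_tile(b, Counter(blocks)):
                return blocks
    return None


if __name__ == "__main__":
    print(min_common_string_partition("abcab", "ababc"))
```

For "abcab" and "ababc", no single block works, but cutting the first string into "abc" and "ab" lets the blocks be reordered into the second string, so the minimum partition has two blocks.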